[AMD] feat: MiniMax M3 day-zero benchmark for MI325X #1748
|
Thanks for the contribution! For vLLM & SGLang, please ensure that your recipe is similar to the official vLLM recipes and/or the SGLang cookbook. If it is not, please create a PR first before we can merge your single-node PR into the master branch. Let's ensure that the documentation is first class so that the entire ML community can benefit from your hard work! Thank you! PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. Failures are often just flakes, and re-running the failed jobs will fix them. If failed jobs are re-run, PR authors are responsible for ensuring they pass. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow As a rule of thumb, PR authors should request a review and get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers. If additional help is needed, PR authors can reach out to core maintainers over Slack. |
|
Claude finished @cquil11's task in 2m 46s —— View job Review: PR #1748 — MiniMax M3 MI325X day-zero benchmark
LGTM — no blocking issues found. Validation summary:
Note (non-blocking): The |
|
Claude finished @cquil11's task in 1m 10s —— View job Review: PR #1748 — MiniMax M3 MI325X day-zero benchmark
LGTM — no blocking issues found. Validation summary:
Note (non-blocking): The |
|
see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=27483150384 |
|
see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=27483489611 |
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Reviewed by Cursor Bugbot for commit 5ec3e11. Configure here.
--no-enable-prefix-caching \
--tool-call-parser minimax_m3 \
--reasoning-parser minimax_m3 \
--enable-auto-tool-choice > "$SERVER_LOG" 2>&1 &
Missing FP8 KV cache flag
Medium Severity
The new MI325X vllm serve invocation omits --kv-cache-dtype fp8 even though the PR recipe alignment, changelog, and the existing minimaxm3_fp8_mi355x.sh baseline all specify FP8 KV cache. Without it, vLLM may use a non-FP8 KV default, skewing memory headroom and throughput versus the official MI325X MXFP8 recipe and other MiniMax M3 entries.
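The finding above comes down to whether `--kv-cache-dtype fp8` appears on the `vllm serve` command line. As a minimal sketch, the flag could be made an explicit toggle so that both configurations can be compared on the same node. The function name `build_serve_args` and the variable `KV_CACHE_DTYPE` are hypothetical and not taken from the PR's script; the flags are the ones quoted in this thread. The helper only prints the arguments and does not launch a server.

```bash
#!/usr/bin/env bash
# Hypothetical helper: prints vllm serve arguments, one per line, without launching.
# KV_CACHE_DTYPE is an illustrative variable name, not part of the PR's script.
build_serve_args() {
  local kv_dtype="${KV_CACHE_DTYPE:-}"   # empty: leave vLLM's default KV cache dtype
  local args=(
    serve MiniMaxAI/MiniMax-M3-MXFP8
    --block-size 128
    --attention-backend TRITON_ATTN
    --language-model-only
    --no-enable-prefix-caching
    --tool-call-parser minimax_m3
    --reasoning-parser minimax_m3
    --enable-auto-tool-choice
  )
  # Append the KV cache dtype only when it was requested explicitly.
  if [[ -n "$kv_dtype" ]]; then
    args+=(--kv-cache-dtype "$kv_dtype")
  fi
  printf '%s\n' "${args[@]}"
}
```

Calling `build_serve_args` with `KV_CACHE_DTYPE` unset lists the arguments without the flag, while `KV_CACHE_DTYPE=fp8 build_serve_args` adds it, which keeps an A/B comparison down to a single variable.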
|
see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=27485135330 |
|
see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=27487160980 |
|
see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=27489021974 |
Rebased onto main: MiniMax-M3 MXFP8 MI325X day-zero recipe (script + amd-master entry + perf-changelog + mi325x launcher tuning), plus VLLM_USE_BREAKABLE_CUDAGRAPH=0 so the recipe runs with CUDA graphs.

Consolidated the branch's commits onto current main (which now carries the mi300x non-MTP/MTP recipes) to resolve the amd-master/changelog EOF-append conflicts.

Co-Authored-By: functionstackx <47992694+functionstackx@users.noreply.github.com>
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
f78392f to 6abc71f
|
see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=27491836349 |


Summary
- Adds minimaxm3-fp8-mi325x-vllm for MiniMax M3 MXFP8 on MI325X
- Uses vllm/vllm-openai-rocm:minimax-m3 and the official MI325X MXFP8 recipe shape
- Moves the Hugging Face hub cache to /local-nvme/hf-hub-cache/ and runtime compiler caches to container-local /tmp
- Mounts /dev/kfd and /dev/dri explicitly for ROCm

Recipe Alignment
- MiniMaxAI/MiniMax-M3-MXFP8
- vllm/vllm-openai-rocm:minimax-m3
- --block-size 128
- --attention-backend TRITON_ATTN
- --language-model-only
- --no-enable-prefix-caching
- --enforce-eager workaround

Upstream reference: https://recipes.vllm.ai/MiniMaxAI/MiniMax-M3?hardware=mi325x&variant=mxfp8

Validation
Representative throughput smoke: https://github.com/SemiAnalysisAI/InferenceX/actions/runs/27482912444
- /local-nvme/hf-hub-cache

Targeted DPA accuracy validation: https://github.com/SemiAnalysisAI/InferenceX/actions/runs/27484953170
- 0.9575
- 0.9568

Failure diagnosis:
--kv-cache-dtype fp8 produced deterministic repetitive/cross-prompt generation corruption and 1-2% on GSM8K. On the same node, image, weights, and layouts, removing only FP8 KV restored correct generation with and without CUDA graphs. The PR therefore leaves the KV cache at vLLM's default dtype.

Additional validation:
- git diff --check pass
- /enroot resolves to local NVMe on every healthy compute node
- XDG_CACHE_HOME and TRITON_CACHE_DIR use per-job local paths, avoiding stale NFS compiler artifacts

Full PR sweep: https://github.com/SemiAnalysisAI/InferenceX/actions/runs/27485135330
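The per-job cache paths mentioned under additional validation could be set up along the following lines. This is a sketch under stated assumptions, not the launcher's actual code: the function name `setup_job_caches`, the `JOB_CACHE_ROOT` override, and the directory layout are hypothetical, and setting `HF_HUB_CACHE` is only one possible way to point Hugging Face downloads at the local NVMe path named in the PR.

```bash
#!/usr/bin/env bash
# Hypothetical helper: point compiler caches at a per-job directory on node-local storage.
setup_job_caches() {
  # SLURM_JOB_ID is set by Slurm inside a job; fall back to a fixed label otherwise.
  local job_id="${SLURM_JOB_ID:-manual}"
  # JOB_CACHE_ROOT and the job-cache-<id> layout are illustrative, not from the PR.
  local root="${JOB_CACHE_ROOT:-/tmp}/job-cache-${job_id}"

  export XDG_CACHE_HOME="${root}/xdg"
  export TRITON_CACHE_DIR="${root}/triton"
  # Path taken from the PR; using the HF_HUB_CACHE variable for it is an assumption.
  export HF_HUB_CACHE="/local-nvme/hf-hub-cache/"

  # Create only the per-job directories; the NVMe cache is expected to exist already.
  mkdir -p "$XDG_CACHE_HOME" "$TRITON_CACHE_DIR"
}
```

Keying the directory on the job ID means two jobs on the same node never share compiled artifacts, which is the property the validation note relies on to avoid stale NFS caches.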
Changelog Integrity
perf-changelog.yaml is current main byte-for-byte, followed only by this PR's entry at the tail.
Note
Low Risk
Benchmark and CI runner configuration only; MI325X launcher changes affect cache paths and GPU device visibility but are scoped to the AMD Slurm launch path.
Overview
Adds MiniMax-M3 MXFP8 single-node vLLM benchmarking on MI325X via a new minimaxm3-fp8-mi325x-vllm matrix entry, a minimaxm3_fp8_mi325x.sh runner aligned to the official MI325X recipe (vllm/vllm-openai-rocm:minimax-m3, block size 128, TRITON_ATTN, MiniMax parsers, default BF16 KV), and an H200-style search space (TP4/TP8, EP, TP8 DPA) for 1k1k and 8k1k.
launch_mi325x-amds.sh is updated for all MI325X jobs: Hugging Face hub cache moves from NFS to /local-nvme/hf-hub-cache/, per-job XDG_CACHE_HOME and TRITON_CACHE_DIR under /tmp, and explicit /dev/kfd and /dev/dri mounts for ROCm in the container.
Reviewed by Cursor Bugbot for commit 6abc71f. Bugbot is set up for automated code reviews on this repo.
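The Changelog Integrity claim (the branch's perf-changelog.yaml is main's copy byte-for-byte plus one appended entry) can be checked mechanically. The following is a minimal sketch assuming both versions of the file are available on disk; `is_append_only` is a hypothetical helper name and is not part of the repository.

```bash
#!/usr/bin/env bash
# Hypothetical check: succeed only if base_file is a strict byte-for-byte prefix of head_file.
is_append_only() {
  local base_file="$1" head_file="$2"
  local base_size head_size
  # Arithmetic expansion strips any padding some wc implementations emit.
  base_size=$(( $(wc -c < "$base_file") ))
  head_size=$(( $(wc -c < "$head_file") ))

  # The branch copy must be strictly longer than main's copy.
  (( head_size > base_size )) || return 1

  # Compare the first base_size bytes of the branch copy against main's copy.
  head -c "$base_size" "$head_file" | cmp -s - "$base_file"
}
```

One way to use it, assuming the remote branch is named `origin/main`, is to write main's copy with `git show origin/main:perf-changelog.yaml > /tmp/base.yaml` and then run `is_append_only /tmp/base.yaml perf-changelog.yaml`; a non-zero exit status means the file was edited somewhere other than the tail.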